Adverse drug reactions (ADRs) are a common problem in clinical and pharmacovigilance research and can lead to serious patient harm and biased conclusions if modeled poorly. Drug–ADR data are sparse, noisy, and highly imbalanced, so standard machine learning methods often perform no better than naïve frequency‑based approaches unless appropriate low‑rank and kernel‑based methods are used with clear assumptions [1].
Aim: To predict adverse drug reaction (ADR) profiles by integrating chemical fingerprints and drug–gene interaction features using advanced statistical modeling approaches.
Figure 1: Distribution of side effects per drug. This histogram illustrates a right-skewed distribution, where the majority of drugs have a relatively low number of side effects, typically clustered between 0 and 200. As the number of side effects increases, the frequency of drugs drops significantly, with few extreme outliers reaching over 800 side effects.
Figure 2: Drugs with most side effects and most frequent side effects
Figure 3: Drug and side-effect similarity. The heatmap reveals high-density clusters (dark blue) where specific drug classes exhibit similar side-effect profiles.
ADR Profile Prediction Methods Using Drug–Gene Interaction Features:
1. Naïve Frequency Model - Predicts ADRs based solely on their observed prevalence in the dataset, serving as a baseline.
2. Kernel Regression (KR) - Models the relationship between drug features and ADRs using a similarity-based kernel approach.
3. Linear SVM - Classifies ADR presence using a linear hyperplane in feature space.
4. RBF-Kernel SVM - Employs a non-linear radial basis function kernel to capture complex relationships between drug features and ADRs.
5. VKR (NMF + Kernel Ridge Regression) - Combines low-rank latent factor decomposition (NMF) with kernel ridge regression to predict ADRs in sparse and imbalanced datasets.
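To make the first and last methods in the list above concrete, the following is a minimal sketch in Python using NumPy only. It is not the VKR implementation from [1]: the two-stage design (factorize the drug-by-ADR matrix with NMF, then regress the latent drug factors on drug features with RBF kernel ridge regression) is one plausible reading of "NMF + kernel ridge regression". The function names, the hyperparameters (rank, gamma, lam), and the random toy data are all assumptions made for illustration.

```python
import numpy as np

def naive_frequency_scores(Y_train, n_test):
    """Baseline: score each ADR by its prevalence among training drugs."""
    prevalence = Y_train.mean(axis=0)           # one value per ADR
    return np.tile(prevalence, (n_test, 1))     # same scores for every test drug

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nmf(Y, rank, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF: Y ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((Y.shape[0], rank))
    H = rng.random((rank, Y.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H

def kernel_ridge_predict(X_train, T_train, X_test, gamma, lam):
    """Closed-form kernel ridge regression with an RBF kernel."""
    K = rbf_kernel(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), T_train)
    return rbf_kernel(X_test, X_train, gamma) @ alpha

def nmf_krr_scores(X_train, Y_train, X_test, rank=5, gamma=0.1, lam=1.0):
    """Hypothetical two-stage pipeline: NMF on ADR profiles, KRR on latent factors."""
    W, H = nmf(Y_train, rank)                   # drugs x rank, rank x ADRs
    W_test = kernel_ridge_predict(X_train, W, X_test, gamma, lam)
    return W_test @ H                           # ranking scores, not probabilities

# Toy data (random, illustration only): 60 drugs, 30 binary features, 40 ADRs.
rng = np.random.default_rng(42)
X = (rng.random((60, 30)) < 0.2).astype(float)
Y = (rng.random((60, 40)) < 0.1).astype(float)
X_train, X_test, Y_train = X[:50], X[50:], Y[:50]

baseline = naive_frequency_scores(Y_train, len(X_test))
scores = nmf_krr_scores(X_train, Y_train, X_test)
print(baseline.shape, scores.shape)
```

The baseline assigns every test drug the same ADR ranking, whereas the two-stage sketch produces drug-specific scores. In practice the rank, gamma, and lam values would be chosen by cross-validation rather than fixed as here.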
Preliminary analysis of Figure 4 shows that the Naïve baseline and VKR achieve the highest AUROC (≈0.91), while KR and VKR achieve the best AUPR (≈0.41–0.42), clearly outperforming the SVM variants on both metrics. VKR therefore provides the best overall trade‑off between discrimination (AUROC) and rare ADR detection (AUPR), motivating its use as the main reference method in further experiments.
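For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them for a single score vector. It is an illustration in NumPy, not the evaluation code used for the reported results: AUROC is computed as the fraction of positive/negative pairs ranked correctly, and AUPR is approximated by average precision, assuming no tied scores. The function names and the eight-item toy example are assumptions.

```python
import numpy as np

def auroc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Mean precision at the ranks where positives occur (assumes untied scores)."""
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()

# Toy example: 2 true ADRs among 8 candidates, ranked 1st and 3rd by the scores.
y_true = [1, 0, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
roc = auroc(y_true, y_score)                # 11 of 12 pairs ordered correctly
ap = average_precision(y_true, y_score)     # (1/1 + 2/3) / 2
print(round(roc, 3), round(ap, 3))
```

Because AUPR depends on precision among the top-ranked predictions, it is more sensitive than AUROC to how well rare ADRs are ranked, which is why both metrics are reported side by side.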
Figure 4: Early Performance of ADR Prediction Methods
We would like to thank Dr. Yezhao Zhong, Dr. Cathal Seoighe, and Dr. Haixuan Yang for their work on ADR prediction and for sharing their code and data through their GitHub page.
[1] Zhong, Y., Seoighe, C., & Yang, H. (2024). Non-Negative matrix factorization combined with kernel regression for the prediction of adverse drug reaction profiles. Bioinformatics Advances, 4(1), vbae009.
[2] Kuhn, M., Letunic, I., Jensen, L. J., & Bork, P. (2016). The SIDER database of drugs and side effects. Nucleic Acids Research, 44(D1), D1075–D1079.
[3] Lv, X., Wang, W., & Liu, H. (2021). Cluster-wise weighted NMF for hyperspectral images unmixing with imbalanced data. Remote Sensing, 13(2), 268.
The code and datasets for this project can be viewed at our GitHub repository here: https://github.com/arshad4387/ADR-Prediction.git